## fragments.per.sample
## mean 44,955,206
## sd 6,398,756
## se 748,918
## median 43,343,558
## min 35,623,905
## max 64,917,188
##
## Shapiro-Wilk normality test
##
## data: log(fragments$read.total)
## W = 0.94823, p-value = 0.00465
## Analysis of Variance Table
##
## Response: log(read.total)
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 5 0.12749 0.025497 1.4563 0.2159
## Residuals 67 1.17306 0.017508
## # A tibble: 6 × 2
## treatment n
## <ord> <int>
## 1 3_amb 12
## 2 3_low 11
## 3 6_amb 14
## 4 6_low 13
## 5 10_amb 11
## 6 10_low 12
## # A tibble: 3 × 2
## temperature n
## <fct> <int>
## 1 3 23
## 2 6 27
## 3 10 23
## # A tibble: 2 × 2
## ph n
## <fct> <int>
## 1 amb 37
## 2 low 36
## # A tibble: 21 × 3
## # Groups: treatment [6]
## treatment tank n
## <ord> <fct> <int>
## 1 3_amb 19 12
## 2 3_low 13 3
## 3 3_low 14 3
## 4 3_low 21 2
## 5 3_low 22 3
## 6 6_amb 3 4
## 7 6_amb 4 4
## 8 6_amb 17 3
## 9 6_amb 18 3
## 10 6_low 7 3
## # … with 11 more rows
## [1] "20,975"
## genes.per.sample
## mean 20,934
## sd 18
## se 2
## median 20,937
## min 20,852
## max 20,955
## temperature ph treatment tank treatment_tank LOC115539476 LOC115539709
## PCG001 10 amb 10_amb 1 10_amb_1 257 758
## PCG004 10 amb 10_amb 1 10_amb_1 228 741
## PCG010 10 amb 10_amb 1 10_amb_1 317 814
## PCG011 10 amb 10_amb 1 10_amb_1 333 837
## PCG015 10 amb 10_amb 1 10_amb_1 219 677
## PCG017 10 amb 10_amb 2 10_amb_2 270 482
## LOC115538781 abhd14a acy1 LOC115537228 LOC115537019 LOC115538651
## PCG001 586 1197 1626 2106 659 659
## PCG004 596 1381 1244 2402 544 625
## PCG010 664 1313 1691 2367 552 616
## PCG011 576 1629 1670 2555 911 571
## PCG015 630 1031 1245 2008 564 504
## PCG017 417 774 1224 1845 505 433
## LOC115538267 kbtbd12 ccdc36 c1h1orf159 LOC115538500 LOC115539723
## PCG001 54 1003 42 37 731 2841
## PCG004 177 743 42 53 948 3579
## PCG010 126 722 32 57 837 3088
## PCG011 74 646 44 71 688 2732
## PCG015 143 758 53 73 623 3212
## PCG017 97 714 20 75 675 4046
## LOC115537157 usp48 LOC115537633 LOC115539576 atxn7l2
## PCG001 1 1843 254 274 277
## PCG004 7 2320 258 288 324
## PCG010 12 2328 324 281 339
## PCG011 12 1881 259 245 253
## PCG015 10 1872 281 254 277
## PCG017 8 2260 400 284 292
## [1] TRUE
NOTE: this was generated by the pheatmap function in the previous code chunk
## [1] TRUE
NOTE: Hover over points to see the sample numbers
This PCA also suggests that two samples - 137 and 194 - are outliers. Consider removing.
This enables screeplot to ID significant PCs, calculates variance, etc.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Variance(eigenvalue) 814.575515 170.8781408 132.5304807 96.9091562 72.7767275
## Proportion of Variance 0.383931 0.0805394 0.0624651 0.0456759 0.0343016
## Cumulative Proportion 0.383931 0.4644704 0.5269356 0.5726114 0.6069130
## Broken-stick value 4.874509 3.8745088 3.3745088 3.0411755 2.7911755
## PC6 PC7 PC8 PC9 PC10
## Variance(eigenvalue) 63.3710808 53.0599813 44.2083670 33.2200195 27.5233513
## Proportion of Variance 0.0298685 0.0250086 0.0208366 0.0156575 0.0129725
## Cumulative Proportion 0.6367815 0.6617901 0.6826266 0.6982841 0.7112566
## Broken-stick value 2.5911755 2.4245088 2.2816516 2.1566516 2.0455405
## PC11 PC12 PC13 PC14 PC15
## Variance(eigenvalue) 26.3006437 25.1253462 22.3765405 20.8120895 20.270432
## Proportion of Variance 0.0123962 0.0118422 0.0105467 0.0098093 0.009554
## Cumulative Proportion 0.7236528 0.7354950 0.7460417 0.7558510 0.765405
## Broken-stick value 1.9455405 1.8546314 1.7712981 1.6943750 1.622947
## PC16 PC17 PC18 PC19 PC20
## Variance(eigenvalue) 18.3470894 17.8624428 17.1131327 16.3363877 15.8122209
## Proportion of Variance 0.0086475 0.0084190 0.0080659 0.0076998 0.0074527
## Cumulative Proportion 0.7740525 0.7824715 0.7905374 0.7982371 0.8056899
## Broken-stick value 1.5562798 1.4937798 1.4349563 1.3794007 1.3267691
## PC21 PC22 PC23 PC24 PC25
## Variance(eigenvalue) 15.0447656 13.3316321 13.1757606 12.7745293 12.0961356
## Proportion of Variance 0.0070910 0.0062836 0.0062101 0.0060210 0.0057012
## Cumulative Proportion 0.8127809 0.8190644 0.8252745 0.8312955 0.8369967
## Broken-stick value 1.2767691 1.2291501 1.1836955 1.1402173 1.0985506
## PC26 PC27 PC28 PC29 PC30
## Variance(eigenvalue) 11.9222014 11.6971429 11.1118224 10.7528323 10.3148264
## Proportion of Variance 0.0056192 0.0055132 0.0052373 0.0050681 0.0048617
## Cumulative Proportion 0.8426159 0.8481291 0.8533664 0.8584345 0.8632962
## Broken-stick value 1.0585506 1.0200891 0.9830520 0.9473377 0.9128550
## PC31 PC32 PC33 PC34 PC35
## Variance(eigenvalue) 9.8561080 9.8032776 9.6599644 9.4759712 9.1288228
## Proportion of Variance 0.0046454 0.0046205 0.0045530 0.0044663 0.0043027
## Cumulative Proportion 0.8679416 0.8725622 0.8771152 0.8815814 0.8858841
## Broken-stick value 0.8795217 0.8472636 0.8160136 0.7857106 0.7562988
## PC36 PC37 PC38 PC39 PC40
## Variance(eigenvalue) 8.8696675 8.7441125 8.6088049 8.3940185 8.2364842
## Proportion of Variance 0.0041805 0.0041213 0.0040576 0.0039563 0.0038821
## Cumulative Proportion 0.8900646 0.8941859 0.8982435 0.9021998 0.9060819
## Broken-stick value 0.7277274 0.6999496 0.6729226 0.6466068 0.6209657
## PC41 PC42 PC43 PC44 PC45
## Variance(eigenvalue) 8.0999799 7.9089651 7.8861890 7.7561356 7.5987719
## Proportion of Variance 0.0038177 0.0037277 0.0037170 0.0036557 0.0035815
## Cumulative Proportion 0.9098996 0.9136273 0.9173443 0.9210000 0.9245815
## Broken-stick value 0.5959657 0.5715755 0.5477660 0.5245102 0.5017829
## PC46 PC47 PC48 PC49 PC50
## Variance(eigenvalue) 7.4077481 7.1665983 7.1330399 6.9318846 6.8485534
## Proportion of Variance 0.0034915 0.0033778 0.0033620 0.0032672 0.0032279
## Cumulative Proportion 0.9280729 0.9314507 0.9348127 0.9380799 0.9413078
## Broken-stick value 0.4795607 0.4578215 0.4365449 0.4157116 0.3953034
## PC51 PC52 PC53 PC54 PC55
## Variance(eigenvalue) 6.7220797 6.6352222 6.4961766 6.4286927 6.3415097
## Proportion of Variance 0.0031683 0.0031274 0.0030618 0.0030300 0.0029889
## Cumulative Proportion 0.9444761 0.9476035 0.9506653 0.9536953 0.9566842
## Broken-stick value 0.3753034 0.3556956 0.3364648 0.3175969 0.2990784
## PC56 PC57 PC58 PC59 PC60
## Variance(eigenvalue) 6.1071900 6.0698927 6.0035330 5.8990444 5.8974722
## Proportion of Variance 0.0028785 0.0028609 0.0028296 0.0027804 0.0027796
## Cumulative Proportion 0.9595627 0.9624236 0.9652532 0.9680336 0.9708132
## Broken-stick value 0.2808966 0.2630394 0.2454956 0.2282542 0.2113050
## PC61 PC62 PC63 PC64 PC65
## Variance(eigenvalue) 5.7345874 5.5844864 5.5057888 5.4439828 5.3123272
## Proportion of Variance 0.0027029 0.0026321 0.0025950 0.0025659 0.0025038
## Cumulative Proportion 0.9735161 0.9761482 0.9787432 0.9813091 0.9838130
## Broken-stick value 0.1946384 0.1782449 0.1621159 0.1462429 0.1306179
## PC66 PC67 PC68 PC69 PC70
## Variance(eigenvalue) 5.2219647 5.0468673 5.0410113 4.8607550 4.7943737
## Proportion of Variance 0.0024613 0.0023787 0.0023760 0.0022910 0.0022597
## Cumulative Proportion 0.9862742 0.9886530 0.9910289 0.9933199 0.9955796
## Broken-stick value 0.1152333 0.1000817 0.0851564 0.0704505 0.0559577
## PC71 PC72 PC73
## Variance(eigenvalue) 4.7460929 4.6324740 0.0000000
## Proportion of Variance 0.0022370 0.0021834 0.0000000
## Cumulative Proportion 0.9978166 1.0000000 1.0000000
## Broken-stick value 0.0416720 0.0275875 0.0136986
## Importance of components:
## [1] 46.44704
Another use of the transformed data is sample clustering. Here, we apply the dist function to the transpose of the transformed count matrix to get sample-to-sample distances.
A heatmap of this distance matrix gives us an overview over similarities and dissimilarities between samples. We have to provide a hierarchical clustering hc to the heatmap function based on the sample distances, or else the heatmap function would calculate a clustering based on the distances between the rows/columns of the distance matrix.
## [1] TRUE
This is the core DESeq2 analysis!
## [1] "Comparison: Ambient pH vs. Low pH, 3C"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 0, 0%
## LFC < 0 (down) : 2, 0.0095%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) by pH, 3C: 2"
## [1] "Comparison: Ambient pH vs. Low pH, 6C"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 2709, 13%
## LFC < 0 (down) : 1941, 9.3%
## outliers [1] : 0, 0%
## low counts [2] : 1220, 5.8%
## (mean count < 24)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) by pH, 6C: 4650"
## [1] "Comparison: Ambient pH vs. Low pH, 10C"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 54, 0.26%
## LFC < 0 (down) : 59, 0.28%
## outliers [1] : 0, 0%
## low counts [2] : 4067, 19%
## (mean count < 117)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) by pH, 10C: 113"
## [1] "Comparison: 3C vs. 6C, ambient pH"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 1241, 5.9%
## LFC < 0 (down) : 1404, 6.7%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, ambient pH: 2645"
## [1] "Comparison: 3C vs. 10C, ambient pH"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 5518, 26%
## LFC < 0 (down) : 5941, 28%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 3C & 10C, ambient pH: 11459"
## [1] "Comparison: 6C vs. 10C, ambient pH"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 4720, 23%
## LFC < 0 (down) : 4328, 21%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, ambient pH: 9048"
## [1] "Comparison: 3C vs. 6C, low pH"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 4281, 20%
## LFC < 0 (down) : 4381, 21%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, low pH: 8662"
## [1] "Comparison: 3C vs. 10C, low pH"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 4943, 24%
## LFC < 0 (down) : 4938, 24%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 3C & 10C, low pH: 9881"
## [1] "Comparison: 6C vs. 10C, low pH"
##
## out of 20975 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 1277, 6.1%
## LFC < 0 (down) : 1067, 5.1%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "No. of genes differentially expressed (padj<0.05) between 6C & 10C, low pH: 2344"
## [1] 20975
## [1] 4741
## [1] 13052
## [1] 12480
## [1] 22.6
## [1] 62.2
## [1] 59.5
NOTE: There are 4,525 DEGs among pH treatment in the 6C group, but only 2 and 58 among the 3C and 10C groups, respectively. This is visible in the heat map- there are clear differences in expression in the 6_amb (green) and 6_low (yellow) treatments.
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] "Amb vs. Low pH @3C" "Amb vs. Low pH @6C" "Amb vs. Low pH @ 10C"
## [4] "3 vs. 6C @Amb pH" "3 vs. 10C @Amb pH" "6 vs. 10C @Amb pH"
## [7] "3 vs. 6C @Low pH" "3 vs. 10C @Low pH" "6 vs. 10C @Low pH"
It is a little clunky, but the DAVID functional analysis tool is one of the best ways to perform enrichment analysis (that I have found to date). To use, you copy a list of Uniprot IDs (or other gene identifier) to clipboard, then paste it into their tool. You then do the same for all genes in the analysis, providing an accurate background list of genes. Here is code to copy all genes, then upregulated and downregulated DEGs identified by each contrast.